Abstract A program invariant is a property that holds for every
execution of the program. Recent work suggests inferring likely invariants
via dynamic analysis. A likely invariant is a property that holds for some
executions but is not guaranteed to hold for all executions. In this
paper, we present work in progress addressing the challenging problem of
automatically verifying that likely invariants are actual invariants. We
propose a constraint-based reasoning approach that is able, unlike other
approaches, both to prove and to disprove likely invariants. In the latter
case, our approach provides counter-examples. We illustrate the approach on
a motivating example where automatically generated likely invariants are
verified.



1 Introduction 

A program invariant is a property that holds over every execution of the 
program. Examples of program invariants include loop invariants pre- 
sented by Hoare in the weakest precondition calculus 12 or pre-post con- 
ditions of the design by contracts approach ^1] . Invariants have proved 
to be crucial in various fields of software engineering such as specifica- 
tion refinement, software evolution or software verification. Unfortunately, 
writing invariants is a tedious task and few programmers write program 
invariants by themselves. 

To alleviate this problem, one line of research aims at inferring
invariants a posteriori. In this case, invariants correspond to the actual
behaviors of programs, not to their intended behaviors.

A common approach is to use static analysis, which infers invariants 
from the source code. For example, abstract interpretation-based analyses 
generate different kinds of invariants, depending on the abstract domain
used : intervals [3], polyhedra [I] or octagons ^5], to name a few. These 
methods generate sound invariants but the abstractions used to address 
problems of termination and complexity may lead to weak accuracy.

Ernst et al. introduced Daikon, a tool performing dynamic inference
of properties using actual values computed during program executions
[6]. The advantage is that the generated properties are in general more
precise than those generated by static inference. The drawback of
this method is that the properties may not hold for particular executions.
They are therefore only likely invariants. Proving likely invariants
correct would make them sound, while they remain in general more precise
than statically inferred invariants.

In this paper, we present work in progress on a constraint-based
approach to verify likely invariants by refutation. We have restricted the
presentation to the validation of likely invariants generated by Daikon.
Nevertheless, other likely invariants can be checked with this approach.
For example, users could test their program against properties that they
know it is supposed to have. The idea of the approach is, first, to generate
a constraint system, $CS$, modeling an imperative program. To do this,
we use the translation of an imperative program into CLP(FD) presented
by Gotlieb et al. [8], which has already proved useful in structural
testing. This transformation can deal with a large subset of the C/C++
language, including floating point numbers and a restricted class of
pointers. Then, we transform a likely invariant into a constraint $I$.
Finally, we try to find a solution of the constraint system
$CS \wedge \neg I$: if the constraint solver finds a solution, then the
likely invariant is spurious. If the solver finds that there is no solution,
then the likely invariant is an invariant. Unfortunately, the resolution
might not terminate or might take too long. In these cases, nothing can be
concluded. From a declarative point of view, this approach is very similar
to the verification of programs based on Horn Logic Denotations. The
difference is the use of constraint logic to express the semantics of an
imperative language instead of pure Horn logic. When running the
verification, Horn logic leads to a generate-and-test method, whereas
constraint logic leads to a propagate-generate-and-test method. We expect
our approach to be more efficient because the propagation should reduce
the number of test cases.

The different steps of our approach are detailed on a motivating ex- 
ample. Three likely invariants are generated by a dynamic inference. By 
applying the method presented here, two of them are disproved and the 
other is proved. 

The contribution of the approach, as illustrated on our example, is its
ability both to prove and to disprove some likely invariants. In the literature,
similar techniques are dedicated to either one or the other. Jackson and 
Vaziri use a constraint solving-based approach that only allows them to 
disprove likely invariants ^H]- Nimmer and Ernst present an experiment 
to prove the correctness of likely invariants using the static checker ESC- 
Java |2ll(i| . When ESC- Java fails to prove a likely invariant, it might be 
due to the lack of an assertion or precondition rather than to an actual 
error. Because of this point, ESC-Java cannot disprove spurious likely 
invariants. 

Section 2 briefly describes the work of Ernst et al. on dynamic in- 
ference of likely invariants. Section 3 presents our motivating example. 
The dynamic analysis of Ernst et al. is used to infer invariants on this 
program. Section 4 summarizes the translation of an imperative program 
into a constraint system. Section 5 illustrates how we propose to refute
or prove a likely invariant using constraint solving. Section 6 discusses
difficulties encountered with our approach. Finally, section 7 concludes 
the paper. 

2 Dynamic inference of invariants 

This section briefly describes the seminal work of Ernst et al. on dynamic
inference of likely invariants [6].

Previous work about the inference of program invariants used static 
analyses. Results of such analyses are sound, which is very important 
for program invariants. The drawback is that the approximations and
complex algorithms required to achieve soundness may lead to weak
accuracy.

Ernst et al. propose a compromise where the soundness of the results 
is not guaranteed in order to gain accuracy. They use dynamic analyses 
that compute likely invariants from data collected during executions. The 
underlying idea is that, if a property holds over many executions, then it
is likely to be an invariant.

Daikon is a tool that implements the dynamic inference of likely invari- 
ants in four steps. Firstly, the program is instrumented to automatically 
trace values of variables of interest during execution. Secondly, a test 
suite is executed on this new program. The data collected during these 
executions are stored in a database. Thirdly, the set of potential likely in- 
variants is generated. Daikon uses a pool of relationships to automatically 
generate all potential invariants between variables that can be compared. 

int foo (int n, int r){
    int s = 0;
    while (n > 0) {
        n--;
        if (s == 0){
            s = 1;
            r++;
        }
        else {
            s = 0;
            r--;
        }
    }
    return r;
}

Figure 1. A toy example: the foo program



Comparability between variables is discussed in [7]. Examples of possible
invariants are equalities with a constant (e.g. x = a), non-linear rela- 
tionships among variables (e.g. z = gcd(x,y)) or ordering relationships 
between variables (e.g. x > y). Additional relationships involving at most 
three variables are trivial to add. Finally, the set of possible invariants 
so-generated is checked against the execution data stored in the database. 
Possible invariants that are not falsified during this checking are reported 
to be likely invariants. 
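
The generate-then-falsify principle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Daikon's actual implementation or interface: the template pool `TEMPLATES`, the function `likely_invariants` and the sample values are all hypothetical.

```python
# Hypothetical pool of candidate relationships between two comparable variables.
TEMPLATES = {
    "x == y": lambda x, y: x == y,
    "x != y": lambda x, y: x != y,
    "x >= y": lambda x, y: x >= y,
    "x > y": lambda x, y: x > y,
}

def likely_invariants(samples, templates=TEMPLATES):
    """Return the candidates that no observed (x, y) sample falsifies."""
    return [name for name, holds in templates.items()
            if all(holds(x, y) for x, y in samples)]

# Values assumed to have been traced during three test executions.
observed = [(3, 3), (4, 3), (7, 2)]
print(likely_invariants(observed))
```

With an empty sample list every candidate survives, which illustrates why the reported properties are only likely invariants: they depend entirely on the executed test suite.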

In practice, the complexity of the Daikon algorithm tends to be pro- 
portional to the number of detected invariants. A lot of research is done 
around Daikon to improve the efficiency and accuracy of the inference. 

3 Running example 

This section presents an example of dynamic inference of likely invariants
as presented in Section 2. Figure 1 shows the foo C program. This program
takes two input values: n and r. It returns r if n is negative. Otherwise, it
returns r if n is even and r + 1 if n is odd.

We have used Daikon, the tool presented in Section 2, to infer likely
program invariants of the foo program. We used an all-branch covering
test suite of 25 test cases. In these test cases, the loop is unfolded from
0 to 454 times. With this test suite, the inference configured in the default
mode resulted in three likely invariants at the exit point of the program:

1. orig(r) = 0 ==> return = 0

2. return = 0 ==> orig(r) = 0

3. return >= orig(r)

In these likely invariants, orig(r) corresponds to the value of variable r
at the entry point of the program and return is the value returned by the
program. These likely invariants are not trivial, as they represent a partial
specification of a loop. In particular, likely invariant 3 is complicated to
infer statically. Indeed, it requires detecting that the executed branch of
the conditional alternates at each loop unfolding in such a way that the
value of r cannot become lower than orig(r). Likely invariants 1 and 2 are
also difficult to infer as they can be seen as a disjunction of two properties.
For example, likely invariant 1 is actually $orig(r) \neq 0 \vee return = 0$.

4 Translation of an imperative program into a constraint 
system 

This section describes the first step of our approach to validate likely in- 
variants, namely translating an imperative program into constraint logic 
programming on finite domains (CLP(FD)). More details about the
transformation can be found in [8].

CLP(FD) is an extension of logic programming. In CLP(FD) pro- 
grams, logical variables are assigned a domain and relations between vari- 
ables are described with constraints. A solution to a CLP(FD) program 
is a valuation of every variable in its own domain such that no constraint 
is falsified. Solutions are found using two mechanisms: propagation and
enumeration. Propagation uses domain information of each variable to 
reduce domains of other variables. When no more propagation can be 
done, enumeration, also called labeling, assigns values to variables to find 
a solution. Note that each time a variable is assigned a value, a new 
propagation phase takes this new information into account. 
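
The interplay between propagation and enumeration can be sketched as follows. This is a deliberately naive illustration, assuming domains are small Python sets and constraints are binary predicates. The names `propagate` and `solve` are hypothetical, and real CLP(FD) solvers rely on interval reasoning and dedicated propagators instead.

```python
def propagate(domains, constraints):
    """Remove unsupported values until a fixed point; False if a domain empties."""
    changed = True
    while changed:
        changed = False
        for x, y, holds in constraints:
            keep_x = {a for a in domains[x] if any(holds(a, b) for b in domains[y])}
            keep_y = {b for b in domains[y] if any(holds(a, b) for a in keep_x)}
            if keep_x != domains[x] or keep_y != domains[y]:
                domains[x], domains[y] = keep_x, keep_y
                changed = True
    return all(domains.values())

def solve(domains, constraints):
    """Propagate, then label one undecided variable and propagate again."""
    domains = {var: set(values) for var, values in domains.items()}
    if not propagate(domains, constraints):
        return None
    undecided = [var for var in domains if len(domains[var]) > 1]
    if not undecided:
        return {var: next(iter(values)) for var, values in domains.items()}
    var = undecided[0]
    for value in sorted(domains[var]):      # enumeration (labeling)
        trial = dict(domains)
        trial[var] = {value}
        solution = solve(trial, constraints)  # new propagation phase
        if solution is not None:
            return solution
    return None

succ3 = ("X", "Y", lambda x, y: y == x + 3)
large = ("X", "Y", lambda x, y: x + y > 14)
start = {"X": range(10), "Y": range(10)}
print(solve(start, [succ3, large]))  # decided by propagation alone
print(solve(start, [succ3]))         # needs one labeling step
```

In the first call propagation alone reduces both domains to singletons. In the second, labeling X triggers the propagation that fixes Y.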

The goal of the transformation described in the following is to generate
a CLP(FD) constraint between the input and output variables of an
imperative program. Values for which this constraint is satisfied are those
that correspond to an existing execution of the program. More formally, if
In is the list of input variables of the program and Out the list of output
variables, a constraint clp_prog(In, Out) is generated. If the pair (I, O)
is a solution of clp_prog then the execution of the original program on
inputs I returns values O.

The translation uses the SSA-form as an intermediary form of the 
program. The instructions of the intermediary program are transformed 
into constraints. In particular, specific operators are designed to deal with
control structures. 

4.1 The SSA-form 

The SSA-form is an intermediate representation of imperative programs 
which prepares the translation into CLP(FD). It has originally been pre- 
sented by Cytron et al. to optimize compilers 0. The SSA form is a 
semantically equivalent version of a program where each variable has a 
unique definition and every use of this variable is reached by the defini- 
tion. 

The SSA-form is relevant here because logical variables in CLP(FD) 
programs can be assigned only once whereas, in imperative programs not 
in SSA-form, variables can be assigned many times. 

Every program can be transformed into SSA by renaming the uses and
definitions of the variables. For example, i = i + 1; j = j * i is transformed
into $i_2 = i_1 + 1$; $j_2 = j_1 * i_2$. At the junction nodes of the control
structures, SSA introduces special assignments, called $\phi$-functions, to
merge several definitions of the same variable: $v_2 = \phi(v_0, v_1)$ assigns
the value of $v_0$ to $v_2$ if the flow comes from the first branch of the
decision, and the value of $v_1$ otherwise. In the case of conditional
structures, $\vec{v}_0$ and $\vec{v}_1$ are respectively the vectors of
defined variables in the then and else branches, and $\vec{v}_2$ is the vector
of these variables out of the conditional structure. Depending on the validity
of the condition, $\vec{v}_2 = \vec{v}_0$ or $\vec{v}_2 = \vec{v}_1$.
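
The renaming step can be illustrated with a small Python sketch. It only handles straight-line sequences of simple assignments, and the helpers `to_ssa` and `phi` are hypothetical names introduced here for illustration. They are not part of the translation tool described in this paper.

```python
import re

def to_ssa(statements):
    """Rename 'x = expr' statements so that each variable is defined once."""
    versions = {}
    renamed = []
    for statement in statements:
        target, expr = (part.strip() for part in statement.split("=", 1))

        def use(match):
            # A use refers to the latest definition; version 1 is the entry value.
            name = match.group(0)
            return f"{name}_{versions.setdefault(name, 1)}"

        expr = re.sub(r"[A-Za-z_]\w*", use, expr)
        versions[target] = versions.get(target, 0) + 1
        renamed.append(f"{target}_{versions[target]} = {expr}")
    return renamed

def phi(flow_from_first_branch, v0, v1):
    """Merge two definitions depending on the branch the flow comes from."""
    return v0 if flow_from_first_branch else v1

print(to_ssa(["i = i + 1", "j = j * i"]))
```

The printed statements correspond to the example given in the text.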

4.2 Instructions as CLP(FD) constraints 

The instructions of the original program are transformed into constraints
between logical variables. Type declarations are translated into domain
constraints. For example, the declaration of a signed integer x is translated
into $X \in -2^{31} .. 2^{31} - 1$, where $X$ is a logical FD_variable.

Assignments and decisions are translated into arithmetical constraints.
For example, assignment x = x + 1 is converted into the SSA
form $x_2 = x_1 + 1$ and further translated into $X_2 = X_1 + 1$, where
$X_1, X_2$ are logical FD_variables.

The main difficulty is to transform control structures into constraints. 
As described in the following, two specific operators are used. 

Conditional statements The conditional statement is treated with a
specific combinator ite/6. Arguments of ite/6 are the variables that
appear in the $\phi$-functions and the constraints generated from the different
parts of the original conditional statement. Note that other combinators
may be nested into the arguments of ite/6. The SSA if_else statement:

    if (exp) {stmt} else {stmt}  $\vec{v}_2 = \phi(\vec{v}_0, \vec{v}_1)$

is translated into $ite(C_{Cond}, \vec{v}_0, \vec{v}_1, \vec{v}_2, C_{Then}, C_{Else})$
where $C_{Cond}$ is a constraint generated by the analysis of exp and $C_{Then}$
(resp. $C_{Else}$) is a set of constraints generated by the analysis of the then
branch (resp. else branch).

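The combinator ite/6 is defined as:

Definition 1 ite/6

\[
\begin{array}{l}
ite(C_{Cond}, \vec{v}_0, \vec{v}_1, \vec{v}_2, C_{Then}, C_{Else}) \;:\!-\\
\quad C_{Cond} \longrightarrow C_{Then} \wedge \vec{v}_2 = \vec{v}_0,\\
\quad \neg C_{Cond} \longrightarrow C_{Else} \wedge \vec{v}_2 = \vec{v}_1,\\
\quad \neg(C_{Cond} \wedge C_{Then} \wedge \vec{v}_2 = \vec{v}_0) \longrightarrow \neg C_{Cond} \wedge C_{Else} \wedge \vec{v}_2 = \vec{v}_1,\\
\quad \neg(\neg C_{Cond} \wedge C_{Else} \wedge \vec{v}_2 = \vec{v}_1) \longrightarrow C_{Cond} \wedge C_{Then} \wedge \vec{v}_2 = \vec{v}_0,\\
\quad (C_{Cond} \wedge C_{Then} \wedge \vec{v}_2 = \vec{v}_0) \vee (\neg C_{Cond} \wedge C_{Else} \wedge \vec{v}_2 = \vec{v}_1).
\end{array}
\]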

This definition uses guarded-constraints. A guarded-constraint
$head \longrightarrow tail$ rewrites into $tail$ if the constraint $head$ is
entailed by the constraint store. The first two guarded-constraints
straightforwardly result from the operational semantics of the if_else
statement whereas the third and the fourth correspond to a backward
reasoning. In this case, values of $\vec{v}_2$ are used to deduce information
concerning the flow. The last constraint contains the constructive
disjunction operator $\vee$. This operator removes from the domains of the
variables the values that are removed whatever the executed part of the
disjunction is. For example, if the constraint
$ite(\ldots, [X_0], [X_1], [X_2], X_0 = 1, X_1 = 3)$ holds, the
constructive disjunction operator deduces that $X_2 \in \{1, 3\}$.
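
The effect of constructive disjunction and of the backward guards can be sketched on explicit domains. This is a minimal illustration, assuming domains are Python sets. The function `ite_prune` is a hypothetical helper, not the implementation of ite/6.

```python
def ite_prune(x2_domain, then_values, else_values):
    """Prune the domain of X2 given the values each branch allows for it.

    Returns the pruned domain and two flags telling whether the then
    branch and the else branch are still feasible.
    """
    then_ok = x2_domain & then_values   # values compatible with the then branch
    else_ok = x2_domain & else_values   # values compatible with the else branch
    # Constructive disjunction: keep a value if at least one branch allows it.
    return then_ok | else_ok, bool(then_ok), bool(else_ok)

# X0 = 1 in the then branch, X1 = 3 in the else branch, X2 initially unknown.
print(ite_prune(set(range(-10, 11)), {1}, {3}))
# Backward reasoning: if X2 is known to be 3, the then branch is refuted.
print(ite_prune({3}, {1}, {3}))
```

In the second call only the else branch remains feasible, which corresponds to deducing that the condition does not hold.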

Iterative statements The SSA while statement

    $\vec{v}_2 = \phi(\vec{v}_0, \vec{v}_1)$  while (exp) {stmt}

is treated with the recursive specific combinator
$w(C_{Cond}, \vec{v}_0, \vec{v}_1, \vec{v}_2, C_{Body})$ where $C_{Cond}$ is a
constraint generated by the analysis of exp and $C_{Body}$ is a set of
constraints generated by the analysis of stmt.
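
Definition 2 w/5

\[
\begin{array}{l}
w(C_{Cond}, \vec{v}_0, \vec{v}_1, \vec{v}_2, C_{Body}) \;:\!-\\
\quad C_{Cond} \longrightarrow (C_{Body} \wedge w(C'_{Cond}, \vec{v}_1, \vec{v}_3, \vec{v}_2, C'_{Body})),\\
\quad \neg C_{Cond} \longrightarrow \vec{v}_2 = \vec{v}_0,\\
\quad \neg(C_{Cond} \wedge C_{Body}) \longrightarrow (\neg C_{Cond} \wedge \vec{v}_2 = \vec{v}_0),\\
\quad \neg(\neg C_{Cond} \wedge \vec{v}_0 = \vec{v}_2) \longrightarrow (C_{Cond} \wedge C_{Body} \wedge w(C'_{Cond}, \vec{v}_1, \vec{v}_3, \vec{v}_2, C'_{Body})).
\end{array}
\]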

Note that combinator w/5 is dynamic: new variables and new constraints
are generated during its evaluation. In particular, the vector $\vec{v}_3$ is
a vector of fresh variables. The first and the last guarded constraints both
make a recursive call to w. The parameters of this new w are not $C_{Cond}$
and $C_{Body}$ but new constraints $C'_{Cond}$ and $C'_{Body}$ where some
variables have been substituted by variables of $\vec{v}_1$ and $\vec{v}_3$
to model the fact that the loop has already been entered once.

The first two guarded-constraints are deduced from the operational 
semantics of the while statement. The third constraint tells that, if the 
constraints extracted from the body are proved to be contradictory with 
the current constraint system then the loop cannot be entered. The last 
constraint models the fact that, if any variable possesses distinct values 
before and after the execution of the while statement, then the loop must 
be entered at least once. 
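
The guard of the last constraint can be illustrated on interval domains. The sketch below assumes that each variable has a closed interval (lo, hi) before and after the loop. The helpers `disjoint` and `must_enter_loop` are hypothetical, and they only mimic the entailment test, not the w/5 combinator itself.

```python
def disjoint(a, b):
    """True if the closed intervals a and b share no value."""
    return a[1] < b[0] or b[1] < a[0]

def must_enter_loop(before, after, cond_entailed=False):
    """Entailment test for the guard: the condition holds, or some variable
    cannot have the same value before and after the while statement."""
    return cond_entailed or any(disjoint(before[v], after[v]) for v in before)

# r is in [1, 5] before the loop and in [-3, 0] after it: it must have changed.
print(must_enter_loop({"r": (1, 5)}, {"r": (-3, 0)}))
# Overlapping intervals: nothing can be deduced from this guard.
print(must_enter_loop({"r": (1, 5)}, {"r": (0, 3)}))
```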

4.3 Translation of the foo program into constraints 

This section presents the translation of the foo program of Figure 1 into
a constraint system. By applying the translation described above, the
constraint system presented in Figure 2 is generated.

For the sake of clarity, we omit the translation into SSA-form. That is
why the constraint system presented in Figure 2 does not explicitly show
all the SSA-names. In fact, the variable names that are in the parameters
of the w and ite operators must be considered only as syntactical names.
Depending on the cases, these names are replaced by logical variables
that are in the vectors $\vec{V}_{old}$, $\vec{V}_{new}$, $\vec{V}_{final}$,
$\vec{V}_{then}$, $\vec{V}_{else}$ or $\vec{V}_{ite}$. Constraints
that correspond to the type declarations of variables are also omitted.

As the transformation faithfully models the operational semantics of
C programs, the constraint system can be executed just like the original
C program. For example, if we instantiate $N_0$ to 5 and $R_0$ to 3, constraint
propagation leads to the instantiation of RET to 4, which is the result of
the original program on the same inputs.

int foo (int n, int r){
    int s = 0;
    while (n > 0) {
        n--;
        if (s == 0){
            s = 1;
            r++;
        }
        else {
            s = 0;
            r--;
        }
    }
    return r;
}

foo([N0, R0], [RET]) :-
    S0 = 0,
    w(n > 0, Vold, Vnew, Vfinal,
      [n = n - 1,
       ite(s = 0, Vthen, Velse, Vite,
           [s = 1, r = r + 1],
           [s = 0, r = r - 1])]),
    RET = Rfinal.

Figure 2. Translation of the foo program into a constraint system
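
When all inputs are instantiated, the constraint system behaves like the original program. The following Python sketch mirrors the shape of the constraint system with two ordinary functions named `ite` and `w`. They only cover this forward, fully instantiated case and are an assumption made for illustration, not the combinators themselves.

```python
def ite(cond, then_branch, else_branch, state):
    """Forward reading of ite: apply the branch selected by the condition."""
    return then_branch(state) if cond(state) else else_branch(state)

def w(cond, body, state):
    """Forward reading of w: unfold the body while the condition holds.
    Written iteratively to avoid Python's recursion limit."""
    while cond(state):
        state = body(state)
    return state

def foo(n0, r0):
    s0 = 0

    def body(v):
        n, r, s = v
        n = n - 1
        r, s = ite(lambda st: st[1] == 0,          # s == 0
                   lambda st: (st[0] + 1, 1),      # s = 1, r = r + 1
                   lambda st: (st[0] - 1, 0),      # s = 0, r = r - 1
                   (r, s))
        return (n, r, s)

    _, r_final, _ = w(lambda v: v[0] > 0, body, (n0, r0, s0))
    return r_final

print(foo(5, 3))
```

The call reproduces the instantiation discussed in the text: inputs 5 and 3 yield 4.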



5 Validation of likely invariants 

In this section, we informally introduce a method to prove or disprove
likely invariants. Section 5.1 explains how we transform the problem of
invariant validation into a constraint satisfaction problem and Section 5.2
illustrates the behavior of constraint solvers for the running example.

5.1 A constraint solving problem 

Section 4 presented a model of an imperative program as a constraint
system. This constraint system, denoted by CS, is a relation between 
the input variables and the output variables. If (X, Y) is a solution of 
CS, X and Y being respectively input and output values, then there 
exists a finite execution of the original program starting with input X 
and returning Y. 

A likely invariant, denoted by $I$, can be seen as one more constraint.
This new constraint should be implied by $CS$ if $I$ really is an invariant.
We want to prove

    $CS \models I$

Such a proof can be established by refutation using constraint solving:

    $CS \models I \Leftrightarrow Sol(CS \wedge \neg I) = \emptyset$

In this equation, $Sol(CS \wedge \neg I)$ denotes the set of solutions of the
constraint system $CS \wedge \neg I$.

When solving the refutation request $CS \wedge \neg I$, there are three cases:

1. there exists a solution (X, Y), which means that the execution starting 
from X and terminating in Y does not verify the likely invariant I. 
Thus, I is spurious and (X, Y) is a counter-example. 

2. there is no solution to this problem. It means that I really is an 
invariant. 

3. the user runs out of patience. It can be due either to a too long compu- 
tation or a non-terminating computation. Nothing can be concluded. 
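
The three possible outcomes can be illustrated with a naive stand-in for the solver. The sketch below is a plain generate-and-test search over small, explicitly bounded input domains with a budget on the number of tried inputs. It is not the constraint-based procedure of this paper, the function `refute` is hypothetical, and a "yes" answer only holds relative to the enumerated domains.

```python
from itertools import product

def foo(n, r):
    """Python transcription of the foo program."""
    s = 0
    while n > 0:
        n -= 1
        if s == 0:
            s, r = 1, r + 1
        else:
            s, r = 0, r - 1
    return r

def refute(program, invariant, domains, budget):
    """Search for a solution of 'CS and not I' within the given domains."""
    tried = 0
    for inputs in product(*domains):
        if tried >= budget:
            return ("maybe", None)             # case 3: out of patience
        tried += 1
        output = program(*inputs)
        if not invariant(inputs, output):
            return ("no", (inputs, output))    # case 1: counter-example
    return ("yes", None)                       # case 2: no solution found

small = (range(-5, 6), range(-5, 6))           # bounded domains for n and r
inv1 = lambda inputs, ret: inputs[1] != 0 or ret == 0   # orig(r) = 0 ==> return = 0
inv3 = lambda inputs, ret: ret >= inputs[1]             # return >= orig(r)

print(refute(foo, inv1, small, budget=1000))
print(refute(foo, inv3, small, budget=1000))
print(refute(foo, inv3, small, budget=3))
```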

As already mentioned in the introduction, the method presented by 
Nimmer and Ernst ^Hj can prove that a program verifies a likely invariant. 
However, if no proof can be established, it might be due to the fact that
there are not enough axioms. For example, loop invariants must be provided
by users in order to soundly prove properties j^j . On the contrary, in the 
work of Jackson and Vaziri it is possible to find a counter-example 
that does not verify the property. However, if none can be found, it can 
be due either to the fact that the likely invariant is indeed an invariant 
or to the inaccuracy of the under-approximation. For example, as the 
number of loop unfoldings is bounded by a value k, there might exist a
counter-example that unfolds a loop k + 1 times.

In other words, to the question "does the program verify the property?",
Nimmer and Ernst answer "yes" or "maybe", Jackson and Vaziri answer
"no" or "maybe" and our method answers "yes", "no" or "maybe".

5.2 Validation of the invariants of the running example 

In this section, we illustrate our approach on the running example. The
first likely invariant inferred by Daikon for the foo program is

    orig(r) = 0 ==> return = 0.

As explained in the previous section, the first step of the validation
consists in adding the negation of the likely invariant to the program. The
request sent to the solver is therefore

    foo([N0, R0], RET), R0 = 0, RET \= 0.    (1)



After propagation the solver answers:

    $N_0 \in [inf, sup],\ RET \in [inf, -1] \cup [1, sup],\ R_0 = 0$    (2)

The propagation alone did not allow the solver to find inconsistencies in
the constraint system. Nothing can be deduced concerning the invariant
unless concrete values for N0 and RET are found. An enumeration step on
variables N0 and RET must be done. Note that variables need to have
a domain for labeling. As the logical variables correspond to integers
in the original imperative program, their bounds are MIN_INT and
MAX_INT. The request is now:

    :- domain([N0, RET], MIN_INT, MAX_INT), foo([N0, R0], RET),
       R0 = 0, RET \= 0, labeling([N0, RET]).    (3)

After propagation and enumeration, the solver finds a solution

    N0 = 1, R0 = 0, RET = 1.    (4)

It means that the execution of the original program with input
n = 1, r = 0 returns ret = 1. This execution is a counter-example of
the likely invariant orig(r) = 0 ==> return = 0. It is therefore disproved.

The second likely invariant inferred by Daikon for the foo program is

    return = 0 ==> orig(r) = 0.

In the same way as above, a counter-example is found:

    n = 1, r = -1, return = 0.

The second likely invariant is therefore also disproved. 
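
Both counter-examples can be replayed on the original program. The sketch below uses a Python transcription of foo and encodes the first two likely invariants, orig(r) = 0 ==> return = 0 and return = 0 ==> orig(r) = 0, as predicates. The helper `is_counter_example` is a hypothetical name used only for this illustration.

```python
def foo(n, r):
    """Python transcription of the foo program."""
    s = 0
    while n > 0:
        n -= 1
        if s == 0:
            s, r = 1, r + 1
        else:
            s, r = 0, r - 1
    return r

def inv1(orig_r, ret):
    """orig(r) = 0 ==> return = 0"""
    return orig_r != 0 or ret == 0

def inv2(orig_r, ret):
    """return = 0 ==> orig(r) = 0"""
    return ret != 0 or orig_r == 0

def is_counter_example(n, r, invariant):
    """True if running foo on (n, r) violates the given likely invariant."""
    return not invariant(r, foo(n, r))

print(is_counter_example(1, 0, inv1))    # n = 1, r = 0 returns 1
print(is_counter_example(1, -1, inv2))   # n = 1, r = -1 returns 0
```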

The third likely invariant inferred by Daikon for the foo program is

    return >= orig(r).

Repeating the operations previously detailed, the following request is sent
to the constraint solver:

    :- foo([N0, R0], RET),
       R0 > RET.    (5)

This time, without any enumeration, the constraint solver answers "no", 
meaning that there is no solution to this problem. The third likely invari- 
ant is therefore proved to be an invariant. 

The behavior of the w operator on the latter refutation is as follows.
Initially, the w operator is instantiated to

    $w(N_0 > 0, [R_0, N_0, S_0], [R_1, N_1, S_1], [RET, N_2, S_2], C_{Body})$

We have not expanded the constraint system of the body for readability
reasons. The fourth guarded constraint of the w operator instantiated for
the foo program is logically equivalent to what follows.

    $N_0 > 0 \vee (R_0 \neq RET) \longrightarrow (N_0 > 0 \wedge C_{Body} \wedge$
    $w(N_1 > 0, [R_1, N_1, S_1], [R_3, N_3, S_3], [RET, N_2, S_2], C'_{Body}))$.

As $R_0 > RET$ (constraint (5)), it is impossible for $R_0$ to be equal to
$RET$. The guard of the previous constraint is entailed. The loop must
therefore be entered and constraints of $C_{Body}$ are set up

    $N_1 = N_0 - 1$.    (6)

As $S_0 = 0$ (first constraint of the foo program), the ite operator sets up
constraints corresponding to the then branch

    $S_1 = 1$    (7)

    $R_1 = R_0 + 1$    (8)

Due to constraints (5) and (8) the following property is true

    $R_1 > R_0 > RET$,    (9)

therefore, it is impossible to have $R_1 = RET$. Consequently, the loop
is unfolded again. Values $[R_3, N_3, S_3]$ are constrained by clones of the
constraints of $C_{Body}$. The same reasoning applies until propagation
deduces that n cannot be greater than 0. At the beginning, n is in the
interval $[MIN\_INT, MAX\_INT]$ so after $MAX\_INT$ iterations n is in the
interval $[MIN\_INT, 0]$ because of constraint (6) and all its clones. Thus,

    $N_{MAX\_INT} \leq 0$    (10)

At this point, $R_{MAX\_INT} > RET$. The second guarded-constraint of the
w operator instantiated for the foo program is:

    $\neg(N_k > 0) \longrightarrow [R_k, N_k, S_k] = [RET, N_2, S_2]$

When $k = MAX\_INT$, the guard is entailed because of constraint (10).
Consequently, the constraint

    $RET = R_{MAX\_INT}$    (11)

is set up. It makes the constraint store unsatisfiable, and this is detected 
by the constraint solver. As a consequence, the third invariant is proved 
to be true. 
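
The alternation that underlies this proof can be observed on concrete runs. The following sketch records the successive values of r across loop unfoldings in a Python transcription of foo. The function `trace_r` is a hypothetical helper, and the runs only illustrate the argument on chosen inputs; they do not replace the proof by constraint solving.

```python
def trace_r(n0, r0):
    """Return the values R_0, R_1, ... taken by r after each loop unfolding."""
    n, r, s = n0, r0, 0
    values = [r]
    while n > 0:
        n -= 1
        if s == 0:
            s, r = 1, r + 1   # then branch: r is incremented
        else:
            s, r = 0, r - 1   # else branch: r comes back to its previous value
        values.append(r)
    return values

print(trace_r(4, 3))
```

On these runs the value after k unfoldings is $R_0 + (k \bmod 2)$, so it never drops below the initial value $R_0$.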

6 Discussion 

The previous section presented three examples of validation of likely in- 
variants by constraint solving. Two likely invariants were disproved by 
the exhibition of a counter-example. The last one was proved to be an 
invariant. 

A point that we have not developed yet is the case where the resolution 
does not terminate or is too long. There are two main reasons why these 
cases can happen. The first reason is due to the loops. Indeed, as the model 
we use describes the operational semantics of a program, if the original 
program does not terminate, then the resolution will not terminate. 

The second reason is a problem of propagation in the constraint system.
As presented in Section 4, the operators ite and w are defined via
guarded-constraints. Consequently, if the entailment of none of the guards 
can be deduced from the current store of constraints, then the resolution 
of the constraint system suspends. The problem is that our system is very 
specific and usual methods of entailment-checking are inefficient in this 
context : domains of variables are very large, constraint store is dynamic 
and constraints used can be non-linear. 

The consequence of this lack of propagation is that, in bad cases, 
almost all the possible values of input variables will have to be enumer- 
ated to prove or disprove likely invariants. In such a case, our approach 
becomes a generate-and-test method, which is intractable when the do- 
mains of input variables are large. Future work will consist in improving 
the propagation inside our specific constraint system. 
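
A rough order of magnitude shows why pure enumeration is intractable. The sketch below assumes 32-bit integer inputs and a purely hypothetical rate of one billion checked inputs per second. Both figures are assumptions made for illustration, and the helper names are not part of our system.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def search_space(num_inputs, bits=32):
    """Number of input combinations for num_inputs integers of the given width."""
    return (2 ** bits) ** num_inputs

def years_to_enumerate(num_inputs, checks_per_second=10 ** 9, bits=32):
    """Time needed to try every input combination at the assumed rate."""
    return search_space(num_inputs, bits) / checks_per_second / SECONDS_PER_YEAR

# The foo program has two integer inputs, n and r.
print(search_space(2))
print(round(years_to_enumerate(2)))
```

Under these assumptions, exhaustively enumerating the two inputs of foo would take several centuries, which is why propagation must prune the search.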

7 Conclusion 

In this paper, we have presented an approach to verify the correctness of 
likely invariants using constraint solving. We have illustrated its principles 
on a toy example. 

The originality of this method is that it can both disprove and prove
likely invariants. This differs from other methods that
are dedicated to only one of these capabilities. Methods using
under-approximations can only disprove likely invariants whereas methods
using over-approximations can only prove likely invariants. Because we do
not use any approximation, we can both prove and disprove, but we cannot
guarantee termination or good performance. Consequently, the key
point of our approach is to have a good propagation inside the constraint
system in order to reduce as much as possible the number of cases where
we cannot conclude.